Small Cell Lung Cancer

El Mehdi Baknine, s194533
Jakob Frostholm Højgaard, s194527
Jonathan Dragestad Møller, s184243
Mikkel Niklas Rasmussen, s193518
Thomas Malthe Mølgaard Tams, s204540

2023-11-28

Introduction

Paper and data source

Title: Comprehensive genomic profiles of small cell lung cancer, George J. et al. (2015)

Loading:
- Dimensions: 81 x 31669
- 30 metadata columns
- 31639 gene expression columns

Data clean:
- Check duplicate IDs
- Clean inconsistently formatted variables
- Check NAs

Data augment:
- 33 metadata columns
- 400 transcripts

Purpose: Identify different small cell lung cancer profiles

Methods

Load, clean and augment

  • Load data from two sheets of an Excel file and combine them into a single file
  • Clean the data by creating usable column names and checking for NAs
  • Augment with 3 new variables:
    Survival status, dead/alive, treatment type
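
The clean and augment steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: reading the Excel sheets is left out, the toy rows stand in for the two sheets, and the column names ("Sample ID", "Status (follow-up)") and the derived variable is_dead are hypothetical names chosen for the example.

```python
import re
from collections import Counter

MISSING = (None, "", "NA")

def clean_name(name):
    """Turn a raw spreadsheet header into a snake_case column name."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip().lower())
    return name.strip("_")

def duplicated_ids(rows, key):
    """Return the IDs that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(i for i, n in counts.items() if n > 1)

def na_counts(rows):
    """Count missing values per column."""
    counts = Counter()
    for row in rows:
        for col, value in row.items():
            if value in MISSING:
                counts[col] += 1
    return dict(counts)

def join_on(left, right, key):
    """Inner-join two lists of dicts on a shared ID column."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]

# Hypothetical toy rows standing in for the metadata and expression sheets.
meta_raw = [
    {"Sample ID": "S1", "Status (follow-up)": "dead"},
    {"Sample ID": "S2", "Status (follow-up)": "NA"},
]
expr_raw = [
    {"Sample ID": "S1", "GENE_A": 5.2},
    {"Sample ID": "S2", "GENE_A": 0.4},
]

# Clean: usable column names, then combine the two sheets on the sample ID.
meta = [{clean_name(k): v for k, v in row.items()} for row in meta_raw]
expr = [{clean_name(k): v for k, v in row.items()} for row in expr_raw]
combined = join_on(meta, expr, "sample_id")

# Checks: duplicate IDs and missing values per column.
duplicates = duplicated_ids(combined, "sample_id")
missing = na_counts(combined)

# Augment: one assumed derived variable; missing status stays missing.
for row in combined:
    status = row["status_follow_up"]
    row["is_dead"] = None if status in MISSING else status == "dead"
```

Keeping the missing status as missing, rather than coercing it to alive, avoids silently biasing any later survival comparison.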

Analysis-specific methods

  • Select transcripts of interest via k-means clustering
  • Heatmap of expression values - Data exploration
  • Hierarchical clustering of samples - Two groups
  • PCA - Check metadata and identify possible transcripts
  • Logistic regression - Statistically identify transcripts of interest in each group
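
The two clustering steps in the list above might look roughly like this. It is a sketch in plain Python on a made-up expression table, covering only k-means on transcripts and a hierarchical clustering of samples cut at two groups. The slides do not say how the k-means clusters were turned into the selected transcripts, which linkage was used, or how the data were scaled, so the average linkage, Euclidean distance, and transcript names here are all assumptions.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Recompute centers; an empty cluster keeps its old center.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def two_groups(samples):
    """Agglomerative clustering (average linkage), merged until two clusters remain."""
    clusters = [[i] for i in range(len(samples))]

    def linkage(a, b):
        total = sum(dist(samples[i], samples[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 2:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters

# Hypothetical expression values: transcript -> value per sample.
expr = {
    "tx_a": [9.0, 8.5, 1.0, 1.2],
    "tx_b": [8.8, 9.1, 0.9, 1.1],
    "tx_c": [1.0, 1.3, 9.2, 8.7],
    "tx_d": [1.1, 0.8, 8.9, 9.0],
}

# k-means groups transcripts by their profile across samples.
transcript_labels = kmeans(list(expr.values()), k=2)

# Hierarchical clustering groups samples (the columns) into two groups.
samples = list(zip(*expr.values()))
sample_groups = two_groups(samples)
```

On real data with tens of thousands of transcripts, a library implementation would be the practical choice; the pairwise linkage loop here is only meant to make the procedure explicit.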

Overview of metadata

Results

Conclusion

  • Confirmation of Study Findings
  • Data Format Consideration
  • Inclusion of Healthy Controls
  • Network Biology Analysis